← Back to portfolio

Overview

This project was a redesign of John Snow's cholera map into a modern, interactive visualization built primarily using python. I first explored no-code tools but shifted back to python to achieve a visualization that would convey the impact of multiple deaths per household. This goal was achieved by creating a pseudo-3D bar graph using plotly, where the height of the bars represents the number of deaths and the data is overlaid onto a geojison map for necessary geographic context. The final deliverable successfully demonstrates the feasibility of transforming a historical static graphic into an interactive visualization.


Introduction¶

In this notebook, I am going to learn more about John Snow's cholera map and gradually redesign the map using modern techniques. The original map is an invaluable tool, often cited as a foundational example of spatial analysis, data visualization, and the birth of modern epidemiology. I will go through the work done by my professor, Dr. Gabby Resch, to have a more comprehensive understanding of John Snow's story and then revamp the famous map mainly using python.

John Snow's Cholera Map¶

No description has been provided for this image

John Snow, a British physician in the mid-19th century, is widely recognized as a pivotal figure in the history of medicine and the founding of modern epidemiology. His critical contribution was made during the 1854 cholera outbreak in the Soho district of London. Contradicting the prevailing Miasma Theory (that disease was spread by foul air), Snow employed a meticulous process of data collection and spatial analysis. He created a celebrated map that visually documented the location of every cholera fatality. This innovative visualization revealed a distinct and compelling clustering of deaths around the public water pump on Broad Street. By empirically linking the incidence of the disease to a contaminated water source, Snow provided irrefutable evidence for the waterborne transmission of cholera, leading to the removal of the pump handle. This intervention is considered a monumental success in public health, establishing the methodology of mapping disease patterns and solidifying the principle of evidence-based environmental health policy.

No description has been provided for this image

Work Done by Professor Dr. Gabby Resch¶

First, I will make sure the given codes run well from JupyterLab.

In [1]:
# Import the necessary libraries
import pandas as pd
import folium
from shapely.geometry import Point, Polygon
In [2]:
# Read in the data
deaths = pd.read_csv('data/deaths.csv')
pumps = pd.read_csv('data/pumps.csv')
In [3]:
# Create 'locations' variables by subsetting only Latitude and Longitude from the datasets
locations_deaths = deaths[['latitude', 'longitude']]
locations_pumps = pumps[['latitude', 'longitude']]

# Transform the dataframes to list of lists
deaths_list = locations_deaths[['latitude', 'longitude']].values.tolist()
pumps_list = locations_pumps[['latitude', 'longitude']].values.tolist()
In [4]:
# plot the data on the map of London
map = folium.Map(location=[51.51333282562678, -0.1368429489745336], tiles='Cartodb Positron', zoom_start=17,
                width=800, height=600)

for point in range(0, len(locations_deaths)):
    folium.CircleMarker(deaths_list[point], radius=8, color='black', fill=True, fill_color='black', opacity = 0.4).add_to(map)
map1 = map

for point in range(0, len(locations_pumps)):
    folium.Marker(pumps_list[point], popup=pumps['pump_name'][point]).add_to(map1)

# Display the map
map1
Out[4]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [5]:
#display the data in clusters
from folium.plugins import FastMarkerCluster
FastMarkerCluster(data=list(zip(deaths['latitude'].values, deaths['longitude'].values))).add_to(map)
folium.LayerControl().add_to(map)
map
Out[5]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [6]:
#the clusters is stylized
for index, row in deaths.iterrows():

    folium.CircleMarker(location=(row["latitude"],
                                  row["longitude"]),
                        radius= row['death_count']*10,
                        color='#FA8072',
                        fillOpacity=0.9,
                        fill=True).add_to(map)
map
Out[6]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [7]:
# make a heatmap with a watercolor background!
base_map = folium.Map(location=[51.5132119,-0.13666], tiles='https://watercolormaps.collection.cooperhewitt.org/tile/watercolor/{z}/{x}/{y}.jpg',
                      attr='Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under CC BY SA.', zoom_start=16,
                     width=800, height=600)

for i,row in pumps.iterrows():
    folium.Marker([row['latitude'],row['longitude']], popup=row['pump_name']).add_to(base_map)
heat_data = [[row['latitude'],row['longitude']] for index, row in locations_deaths.iterrows()]

from folium import plugins
from folium.plugins import HeatMap
HeatMap(heat_data).add_to(base_map)
base_map
Out[7]:
Make this Notebook Trusted to load map: File -> Trust Notebook

My Work¶

The existing maps provided the inspiration for me to seek out new visualization approaches for John Snow’s data. My first choice was Datawrapper This no-code platform was selected specifically for its efficiency in generating professional-looking graphs. I utilized the Symbol map feature, overlaying my data onto the platform's embedded "United Kingdom » London » Westminster » MSOA" map. After uploading the coordinates for both deaths and pumps, I augmented the dataset with two new columns for enhanced labeling and visual refinement. While the resulting map clearly displayed the spatial correlation between death occurrences and pump locations—and made it easy to differentiate the Broad Street pump—its usefulness was constrained. The underlying Datawrapper base map omitted key features like street networks, limiting its value to a simple locational overview.

Flourish was considered for its inherent capacity to render three-dimensional maps. Nevertheless, the effort required to produce impactful 3D visualizations, such as preparing the requisite GeoJSON files, was deemed prohibitive. Consequently, the decision was made to allocate the available time toward further development in Python. Despite lacking a formal programming background, I successfully dedicated time to comprehending and making minor modifications to the provided notebook, and I am motivated to pursue more advanced development.

To produce a similar heatmap, folium function HeatMap() may be used directly to deaths_list without any alteration of data.

In [8]:
map2 = folium.Map(location=[51.5132119,-0.13666], tiles='CartoDB positron', zoom_start=17,
                 width=800, height=600)

heatmap = HeatMap(deaths_list) # HeatMap() function can be used directly
heatmap.add_to(map2)
for point in range(0, len(locations_pumps)):
    folium.Marker(pumps_list[point], popup=pumps['pump_name'][point]).add_to(map2)
map2
Out[8]:
Make this Notebook Trusted to load map: File -> Trust Notebook

The previous heatmaps and cluster maps, while informative, failed to fully convey the human impact of the cholera outbreak. A heatmap's aggregated data doesn't resonate in the way John Snow's original map did, where individual lines—representing lives lost within a household—provided a powerful, symbolic story. To replicate this personal resonance and draw the viewer's immediate attention, it's intuitive to use a visualization that literally "sticks out." Therefore, a 3D bar graph is proposed. Although such graphs are commonly created using tools like Power BI or Tableau, I still aim to specifically explore and implement the 3D bar visualization using Python.

First, a graph with 3D bars representing the nubmer of total deaths in one location (i.e., household) is produced.

In [9]:
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np

# Load the data
deaths = pd.read_csv("data/deaths.csv")
pumps = pd.read_csv('data/pumps.csv')

# Groupby latitude and longitude and sum death_count
deaths_grouped = deaths.groupby(['latitude', 'longitude']).agg(
    total_deaths=('death_count', 'sum')
).reset_index()

# Prepare data for 3D bar plot
x_data = deaths_grouped['latitude'].values
y_data = deaths_grouped['longitude'].values
z_data = deaths_grouped['total_deaths'].values

px_data = pumps['latitude'].values
py_data = pumps['longitude'].values


# Create a figure and a 3D subplot
fig = plt.figure(figsize=(12, 10))
ax = fig.add_subplot(111, projection='3d')

# Set bar dimensions
dx = 0.0001  # Width of bars in x-direction (latitude)
dy = 0.0002  # Depth of bars in y-direction (longitude)
dz = z_data  # Height of bars (total_deaths)
zpos = np.zeros_like(dz) # Start z position at 0
zpos1=np.zeros_like(px_data)

dx1 = 0.0002  # Width of bars for pumps 
dy1 = 0.0003 # Depth of bars for pumps 
dz1=np.ones_like(px_data) # for height of bars for pumps for visualization


# Normalize the death counts for coloring
norm = plt.Normalize(z_data.min(), z_data.max())
colors = plt.cm.plasma(norm(z_data))

# Plot the 3D bars for deaths
ax.bar3d(x_data, y_data, zpos, dx, dy, dz, color=colors, shade=True)

# Plot the 3D bars for pumps
ax.bar3d(px_data, py_data, zpos1, dx1, dy1, dz1*0.6, color='green', shade=True)


# Set labels and title
ax.set_xlabel('Latitude', labelpad=10)
ax.set_ylabel('Longitude', labelpad=10)
ax.set_zlabel('Total Deaths', labelpad=10)
ax.set_title('3D Bar Chart: Total Deaths at Each Location', y=1.05)

# Add a color bar
m = plt.cm.ScalarMappable(cmap=plt.cm.plasma)
m.set_array(z_data)
cbar = fig.colorbar(m, ax=ax, shrink=0.6, aspect=15)
cbar.set_label('Total Deaths')

# Adjust the view angle for better visualization
ax.view_init(elev=75, azim=135)

#ax.axis('off')
# Save the plot
#plt.savefig('total_deaths_3d_bar')
plt.show()
No description has been provided for this image

To me, the "sticking bars" are powerful, successfully conveying the impact of multiple lives lost within a single household - a depth that the 2D heatmap simply could not capture. However, the visualization was not interactive, which limited further exploration. To overcome this, I decided to try 2D Plotly, to create a visualization that combines the spatial impact of a 3D bar chart with the flexibility of a 2D scatter map.

Two distinct trace types may be overlaid: using a Scatter trace to representing the top of the bars and using a Line trace to provide the visual sense of height.

In [10]:
import plotly.graph_objects as go
import pandas as pd
import matplotlib.pyplot as plt

deaths = pd.read_csv("data/deaths.csv")
pumps = pd.read_csv('data/pumps.csv')

# Groupby latitude and longitude and sum death_count
deaths_grouped = deaths.groupby(['latitude', 'longitude']).agg(
    total_deaths=('death_count', 'sum')
).reset_index()

"""
# Prepare data for 3D bar plot
x_data = deaths_grouped['latitude'].values
y_data = deaths_grouped['longitude'].values
z_data = deaths_grouped['total_deaths'].values

px_data = pumps['latitude'].values
py_data = pumps['longitude'].values
"""

x = deaths_grouped['latitude'].values
y = deaths_grouped['longitude'].values
z = deaths_grouped['total_deaths'].values

# Define map boundaries (slightly outside the data extent)
lat_min, lat_max = x.min() - 0.002, x.max() + 0.002
lon_min, lon_max = y.min() - 0.002, y.max() + 0.002

# Create a grid for the Z=0 plane (the map area)
grid_points = 10 
lat_grid = np.linspace(lat_min, lat_max, grid_points)
lon_grid = np.linspace(lon_min, lon_max, grid_points)
lat_mesh, lon_mesh = np.meshgrid(lat_grid, lon_grid)
z_plane = np.zeros_like(lat_mesh) # All Z values are 0

# Colors for the vertical lines
norm = plt.Normalize(z.min(), z.max())
colors = plt.cm.plasma(norm(z))
hex_colors = [f'#{int(c[0]*255):02x}{int(c[1]*255):02x}{int(c[2]*255):02x}' for c in colors]

# Create the figure
fig = go.Figure()

# Add the Z=0 Plane 
fig.add_trace(go.Surface(
    x=lat_mesh,
    y=lon_mesh,
    z=z_plane,
    surfacecolor=np.zeros_like(lat_mesh), 
    colorscale=[[0, 'lightgrey'], [1, 'lightgrey']], 
    showscale=False,
    opacity=0.7,
))


# Add the simulated 3D "bars" (points for the top)
fig.add_trace(go.Scatter3d(
    x=x,
    y=y,
    z=z,
    mode='markers',
    marker=dict(
        size=2,
        color=z,
        colorscale='plasma',
        opacity=0.8
    ),
    hovertemplate = '<b>Latitude:</b> %{x:.6f}<br>' +
                    '<b>Longitude:</b> %{y:.6f}<br>' +
                    '<b>Total Deaths:</b> %{z}<extra></extra>',
    name='Total Deaths'
))

# Add vertical lines for the "bars"
for i in range(len(x)):
    fig.add_trace(go.Scatter3d(
        x=[x[i], x[i]],
        y=[y[i], y[i]],
        z=[0, z[i]],
        mode='lines',
        line=dict(color=hex_colors[i], width=10),
        showlegend=False
    ))

# Customize the layout
fig.update_layout(
    title='Interactive 3D "Bar" Plot',
    width=800,  
    height=800,
    scene = {
        'xaxis_title':'Latitude',
        'yaxis_title':'Longitude',
        'zaxis_title':'Total Deaths',
        'aspectmode': 'manual',
        'aspectratio': dict(x=1, y=1, z=1)
    }
)

fig.show()

I realized the visualization would not have the right impact unless the "bars" were grounded on a recognizable map. To solve this, I found the geographical data for Westminster and downloaded it as a GeoJSON file (topo_E09000033.json) from the Martin's UK GeoJSON repository (https://martinjc.github.io/UK-GeoJSON). The next step is to overlay the pseudo-3D bar graph directly onto this map, alongside the original pump locations, to provide the essential spatial context.

In [11]:
import json
import geopandas as gpd
from shapely.geometry import Polygon, MultiPolygon
import pandas as pd
import plotly.graph_objects as go
import numpy as np
import matplotlib.pyplot as plt

# Load the TopoJSON file
topo = json.load(open('data/topo_E09000033.json'))
scale = topo['transform']['scale']
trans = topo['transform']['translate']

# Decode all arcs: apply delta-encoding and transform to coordinates
decoded_arcs = []
for arc in topo['arcs']:
    x = y = 0
    coords = []
    for dx, dy in arc:
        x += dx
        y += dy
        # apply scale and translate to get actual lon/lat
        lon = trans[0] + x * scale[0]
        lat = trans[1] + y * scale[1]
        coords.append((lon, lat))
    decoded_arcs.append(coords)

# Helper to get the coordinates for an arc index (reversed if negative)
def get_arc_coords(idx):
    if idx >= 0:
        pts = decoded_arcs[idx]
    else:
        pts = decoded_arcs[~idx][::-1]  # ~idx gives the correct index for negative value
    return pts

# Build Shapely geometries from the TopoJSON objects
geoms = []
for geom in topo['objects']['E09000033']['geometries']:
    if geom['type'] == 'Polygon':
        # Exterior ring
        exterior_arcs = geom['arcs'][0]
        exterior = get_arc_coords(exterior_arcs[0]).copy()
        # Concatenate remaining exterior arcs (skip first point to avoid duplicates)
        for arc_idx in exterior_arcs[1:]:
            pts = get_arc_coords(arc_idx)
            exterior.extend(pts[1:])
        # Interior rings (holes), if any
        holes = []
        for hole_arcs in geom['arcs'][1:]:
            hole = get_arc_coords(hole_arcs[0]).copy()
            for arc_idx in hole_arcs[1:]:
                pts = get_arc_coords(arc_idx)
                hole.extend(pts[1:])
            holes.append(hole)
        geoms.append(Polygon(exterior, holes))
    elif geom['type'] == 'MultiPolygon':
        parts = []
        for poly_arcs in geom['arcs']:
            # poly_arcs is a list of rings for this polygon
            ext = get_arc_coords(poly_arcs[0][0]).copy()
            for arc_idx in poly_arcs[0][1:]:
                pts = get_arc_coords(arc_idx)
                ext.extend(pts[1:])
            hole_list = []
            for hole_arcs in poly_arcs[1:]:
                hole = get_arc_coords(hole_arcs[0]).copy()
                for arc_idx in hole_arcs[1:]:
                    pts = get_arc_coords(arc_idx)
                    hole.extend(pts[1:])
                hole_list.append(hole)
            parts.append(Polygon(ext, hole_list))
        geoms.append(MultiPolygon(parts))

# Create GeoDataFrame
gdf = gpd.GeoDataFrame({'geometry': geoms}, crs="EPSG:4326") 
In [12]:
# Load data
deaths = pd.read_csv("data/deaths.csv")
pumps = pd.read_csv("data/pumps.csv")

# Aggregate deaths by unique coordinates
death_counts = deaths.groupby(['longitude', 'latitude']).size().reset_index(name='count')



# Create 3D bar map
fig = go.Figure()

# Base map outline
for i, geom in enumerate(gdf.geometry):
    
    # Check if the geometry is a MultiPolygon
    if geom.geom_type == 'MultiPolygon':
        # If it's a MultiPolygon, iterate over its individual parts
        polygons = list(geom.geoms)
    else:
        # If it's a single Polygon, wrap it in a list to allow uniform processing
        polygons = [geom]
        
    # Loop through all individual Polygons (either the single one or all parts)
    for poly in polygons:
        # Extract and convert coordinates from the exterior ring of the Polygon
        x_coords, y_coords = poly.exterior.xy
        x = list(x_coords)
        y = list(y_coords)
        z = [0] * len(x_coords)
            
        # Add the 3D trace for the exterior boundary
        fig.add_trace(go.Scatter3d(
            x=x,
            y=y,
            z=z, 
            mode='lines',
            line=dict(color='gray', width=2),
            name='Boundary',
            # Only show legend for the first overall trace
            showlegend=(i == 0 and polygons.index(poly) == 0) 
        ))


x1 = death_counts['longitude'].values
y1 = death_counts['latitude'].values
z1 = death_counts['count'].values

# Add the simulated 3D "bars" (points for the top)
fig.add_trace(go.Scatter3d(
    x=death_counts['longitude'],
    y=death_counts['latitude'],
    z=z1*0.0005,
    mode='markers',
    marker=dict(
        size=3,
        color=death_counts['count'],
        colorscale='plasma',
        opacity=0.8
    ),
    hovertemplate = '<b>Latitude:</b> %{x:.6f}<br>' +
                    '<b>Longitude:</b> %{y:.6f}<br>' +
                    '<b>Total Deaths:</b> %{text}<extra></extra>',
    text=death_counts['count'],
    name='Total Deaths at the Location'

))

# Add vertical lines for the "bars"
for i in range(len(x1)):
    fig.add_trace(go.Scatter3d(
        x=[x1[i], x1[i]],
        y=[y1[i], y1[i]],
        z=[0, z1[i]*0.0005],
        mode='lines',
        line=dict(color=death_counts['count'][i], colorscale='plasma', width=10),
        showlegend=False
    ))


# Pumps as scatter markers
fig.add_trace(go.Scatter3d(
    x=pumps['longitude'],
    y=pumps['latitude'],
    z=[0.00001]*len(pumps),
    mode='markers',
    opacity=0.6,
    marker=dict(size=6, color='red', symbol='circle'),
    name='Pumps'
))

# Layout
fig.update_layout(
    title='3D Cholera Death Distribution near Broad Street (John Snow Map)',
    width=800,  
    height=800,
    scene=dict(
        xaxis_title='Longitude',
        yaxis_title='Latitude',
        zaxis_title='Number of Deaths',
        aspectmode='data'
    ),
    margin=dict(l=0, r=0, b=0, t=40)
)

fig.show()

Conclusion¶

The current iteration of the map, while achieving the goal of displaying a pseudo-3D bar graph using Plotly, still presents many opportunities for refinement. Specifically, the base map itself lacks street-level detail, which diminishes the historical context that was so vital to John Snow's original work. Furthermore, the visual presentation of the pseudo-3D bars could be enhanced to be more aesthetically pleasing and impactful.

Despite these ongoing challenges, the notebook demonstrates the feasibility of recreating and improving a cornerstone of epidemiological history using the versatility of Python for data storytelling.

Personal Notes

I readily understood the foundational code, but when I had to actually move into advanced development in Python, like trying to build those interactive visualization features, it proved significantly challenging because I just don't have that formal programming background.


← Back to portfolio